TI - SPEECH DETECTING DEVICE
TI - Audio detector for intercoms used at residences and offices updates the
assessed value for noise power estimation in the noise power
estimation unit based on the judgment of whether the reference signal
is an audio signal
PA - (MATW ) MATSUSHITA ELECTRIC WORKS LTD
AB - JP2000305579 NOVELTY - A reference signal is extracted from a
communication channel, and the power of the audio component and of the
noise component are estimated in the power and noise estimation units (1, 2),
respectively. Based on the estimated noise power, it is judged whether
the reference signal is an audio signal. The assessed value for noise power
estimation in the noise power estimation unit is updated based on the
judgment result.

- USE - For intercoms, telephones used in residence, office, factory. 

- ADVANTAGE - By updating the assessed value for noise power
estimation, the background noise power estimation process is
suspended when the acoustic coupling component included in the
reference signal is large, and hence the presence of an audio component
in the reference signal can be accurately detected.

- DESCRIPTION OF DRAWING(S) - The figure shows the block 
diagram of amplifying call machine. 

- Noise estimation units 1,2 
PA - MATSUSHITA ELECTRIC WORKS LTD
TI - SPEECH DETECTING DEVICE

AB - PROBLEM TO BE SOLVED: To precisely detect whether a 
reference signal contains a speech signal or not in a system 
presenting a large acoustic coupling gain or under a condition with 
a high background noise level. 
- SOLUTION: A background noise power estimating and switching
part 4 stops the estimation processing in a background noise power
estimating part 2 when the ratio of the acoustic coupling
component contained in a reference signal Vx is large. This keeps
the background noise power estimation value Pn close to the
background noise power around a microphone 10 regardless of the
phone call state. Thus, this device reduces the deterioration of
speech detection performance caused by far-side background noise
coupling back into the microphone 10 in a system in which the
microphone 10 and a speaker 11 are a short distance apart, causing
a large acoustic coupling gain, and precisely detects whether a
reference signal contains a speech signal or not under a condition
with a high background noise level.
* NOTICES *

Japan Patent Office is not responsible for any
damages caused by the use of this translation.

1. This document has been translated by computer, so the translation may not reflect the original
precisely.

2. **** shows a word which cannot be translated.
3. In the drawings, no words are translated.



DETAILED DESCRIPTION 



[Detailed Description of the Invention] 
[0001]

[The technical field to which invention belongs] This invention relates to the voice detector for
carrying a normal-mode-rejection function, [illegible], etc. in the speaking circuit of
the **** telephone call equipment (an interphone, a telephone, etc.) used in a residence, an
office, works, etc.

[0002] 

[Description of the Prior Art] Generally, a voice detector is used to detect whether the
acoustic signal collected with a microphone contains a sound signal. A typical example of the
composition of such a voice detector is shown in drawing 5. This voice detector VD' is equipped
with the instant power presumption section 20, the background-noise power presumption section 21,
and the comparison-test section 22. The instant power presumption section 20 is realized by an
integrating circuit, a digital filter, or the like whose response rises steeply and falls gently,
and presumes the short-time average power of a reference signal (the acoustic signal collected with
a microphone). The background-noise power presumption section 21 is realized by an integrating
circuit, a digital filter, or the like whose response rises gently and falls steeply, and presumes
the background-noise level which is steadily present in the reference signal. Furthermore, the
comparison-test section 22 compares the ratio of the instant power estimate calculated by the
instant power presumption section 20 to the background-noise power estimate calculated by the
background-noise power presumption section 21 with a predetermined threshold, thereby judges
(detects) whether the reference signal contains a sound signal, and outputs a binary signal
(detecting signal) of H or L.
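
The conventional detector described above can be sketched roughly as follows. This is only an
illustration under stated assumptions: the one-pole smoother, the coefficient values, the threshold
value, and the names smooth and conventional_detector are hypothetical and do not come from the
patent, which only says that the two sections rise and fall at different speeds.

```python
def smooth(prev, x, rise, fall):
    """One-pole smoother with separate coefficients for rising and falling input."""
    coeff = rise if x > prev else fall
    return prev + coeff * (x - prev)


def conventional_detector(samples, delta=4.0, floor=1e-12):
    """Return one H/L flag (True/False) per sample of the reference signal.

    All numeric values are assumed for illustration only.
    """
    flags = []
    ps = pn = None
    for s in samples:
        p = s * s  # instantaneous power of the sample
        if ps is None:
            # Start both estimates from the first sample (assumed initialization).
            ps = pn = max(p, floor)
        else:
            # Section 20: rises steeply, falls gently.
            ps = smooth(ps, p, rise=0.5, fall=0.01)
            # Section 21: rises gently, falls steeply.
            pn = max(smooth(pn, p, rise=0.001, fall=0.5), floor)
        # Section 22: compare the ratio Ps/Pn with a threshold.
        flags.append(ps / pn > delta)
    return flags
```

With a steady low-level input followed by a loud burst, the flags stay at L during the steady part
and go to H during the burst, because the background-noise estimate follows the burst only slowly.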
[0003] 

[Problem(s) to be Solved by the Invention] However, when the above voice detector VD' is provided in
the internal circuitry of a **** telephone call machine (not shown) which has a loudspeaker and a
microphone, a wraparound component (acoustic turnover component) from the loudspeaker is
contained in the acoustic signal collected with the microphone. When the rate of the wraparound
component contained in this acoustic signal is large, it becomes difficult to achieve the original
purpose of detecting whether a speaker near the microphone has uttered voice. For example, when the
background-noise level near the far-end telephone call terminal is large and the microphone
collects the far-end background noise through the acoustic turnover, the background-noise power
estimate calculated by the background-noise power presumption section 21 of the above conventional
example becomes large. Consequently, even when the speaker near the microphone is uttering voice,
the ratio of the instant power estimate to the background-noise power estimate is small and cannot
exceed the predetermined threshold, so the comparison-test section 22 may erroneously detect the
signal as not being a sound signal (non-voice).

[0004] This invention was made in view of the above problem, and its purpose is to offer a voice
detector which can detect with sufficient precision whether a sound signal is contained in the
reference signal even when the background-noise level is large.
[0005] 

[Means for Solving the Problem] To attain the above purpose, the invention of claim 1 is a voice
detector used in a **** telephone call terminal of a **** telephone call system in which a ****
telephone call terminal having a microphone and a loudspeaker is connected to other telephone call
terminals or **** telephone call terminals and performs a half-duplex telephone call, the voice
detector detecting whether the signal transmitted on a channel is a sound signal or a non-voice
signal. It is characterized by having: an instant power presumption section which presumes the
instant power of a reference signal taken out from the channel; a background-noise power
presumption section which presumes the power of the background-noise component contained in the
reference signal; a 1st voice / non-voice judging section which judges whether the reference signal
is a sound signal or a non-voice signal based on the instant power estimate presumed in the instant
power presumption section and the background-noise power estimate presumed in the background-noise
power presumption section, and which holds the last judgment result until the judgment result is
updated; and a background-noise power presumption change section which switches between updating
and halting the background-noise power estimate in the background-noise power presumption section.
When the rate of the acoustic turnover component contained in the reference signal is large, the
background-noise power presumption change section suspends the processing of the background-noise
power presumption section, and the judgment by the 1st voice / non-voice judging section is
performed based on the background-noise power estimate that was obtained while the rate of the
acoustic turnover component in the reference signal was small and that the background-noise power
presumption section has held. This reduces the situation in which, when the background noise at the
far-end telephone call terminal is sent out from the loudspeaker and turns back to the microphone,
the ratio of the voice component uttered by the near-end speaker to the other background-noise
components in the acoustic signal collected by the microphone becomes small, so that a sound signal
cannot be detected even though it is contained in the reference signal. It can therefore be
detected with sufficient precision whether the reference signal is a sound signal even when the
background-noise level is large.

http://www4.ipdl.jpo.go.jp/cgi-bin/tran_web_cgi_ejje 04/06/2003

[0006] The invention of claim 2 is characterized in that, in the invention of claim 1, it has a
voice non-detecting duration timing section which finds, from the judgment result of the 1st voice /
non-voice judging section, the voice non-detecting duration during which the reference signal is
detected as a non-voice signal, and a 2nd voice / non-voice judging section which judges from the
voice non-detecting duration found by the voice non-detecting duration timing section whether the
reference signal is a sound signal or a non-voice signal, and in that the 2nd voice / non-voice
judging section judges all the reference signals in the voice non-detecting durations to be sound
signals when the voice non-detecting duration found by the timing section is approximately constant
over a time of about the phoneme duration of human voice and is about the pitch period of human
voice. When a sound signal is not detected in the 1st voice / non-voice judging section because the
background-noise level is very large, the reference signal is judged anew as a sound signal in the
2nd voice / non-voice judging section if the voice non-detecting duration measured by the timing
section is almost constant over about an audio phoneme duration and almost equal to the pitch
interval of human voice. A sound signal can therefore be detected with still better precision in a
**** telephone call system with a large background-noise level.

[0007]

[Embodiments of the Invention] (Operation gestalt 1) Drawing 1 is a block diagram showing the
**** telephone call machine M which has the voice detector VD1 in the operation gestalt 1 of this
invention. This **** telephone call machine M is equipped with a microphone 10, a loudspeaker 11,
the microphone amplifier 15, the loudspeaker amplifier 16, the voice detector VD1, and the voice
switch VS, and is connected with other **** telephone call machines etc. through a circuit. Here,
the voice switch VS suppresses howling by reducing the gain of the closed loop formed by the
acoustic turnover from the loudspeaker 11 to the microphone 10 and by the wraparound on the circuit
side. It has the transmission side attenuator 12 inserted on the transmission signal line for
transmitting to the circuit the sound signal (transmission signal) collected with the microphone
10, the receiver side attenuator 13 inserted on the receiver signal line for transmitting to the
loudspeaker 11 the sound signal (receiver signal) received from the circuit, and the amount control
section 14 of insertion losses which controls the gain of the transmission side attenuator 12 and
the receiver side attenuator 13 according to the talk state. The amount control section 14 of
insertion losses observes the transmission and reception talk signals, judges the talk state, and
sets the gain of the transmission side attenuator 12 and the gain of the receiver side attenuator 13
appropriately according to the talk state.

[0008] The voice detector VD1 concerning this invention has the instant power presumption section 1
which presumes the instant power of the reference signal (transmission signal) Vx taken out from the
channel (transmission signal line), the background-noise power presumption section 2 which presumes
the power of the background-noise component contained in the reference signal Vx, the 1st voice /
non-voice judging section 3 which judges whether the reference signal Vx is a sound signal or a
non-voice signal based on the instant power estimate Ps presumed in the instant power presumption
section 1 and the background-noise power estimate Pn presumed in the background-noise power
presumption section 2 and which holds the last judgment result until the judgment result is
updated, and the background-noise power presumption change section 4 which switches between
updating and halting the background-noise power estimate Pn in the background-noise power
presumption section 2.

[0009] The instant power presumption section 1 is constituted by an integrating circuit, a digital
filter, or the like whose response rises steeply and falls gently. The background-noise power
presumption section 2 is constituted by an integrating circuit or a digital filter whose response
rises gently and falls steeply.

[0010] On the other hand, as shown in drawing 2, the 1st voice / non-voice judging section 3 is
constituted by the comparator CP1 which compares the instant power estimate Ps outputted from the
instant power presumption section 1 with the predetermined threshold Ps0 and outputs the binary
signal D1 of H or L, the divider 3a which calculates the ratio Ps/Pn of the instant power estimate
Ps to the background-noise power estimate Pn outputted from the background-noise power presumption
section 2, the comparator CP2 which compares the output value Ps/Pn of the divider 3a with the
predetermined threshold delta and outputs the binary signal D2 of H or L, and the AND-operation
section 3b which calculates the AND of the two binary signals D1 and D2. In this operation gestalt,
the reference signal is judged to be a sound signal when the instant power estimate Ps is larger
than the threshold Ps0 (Ps>Ps0) and the output Ps/Pn of the divider 3a is larger than the threshold
delta (Ps/Pn>delta), and is judged to be a non-voice signal otherwise. Here, the threshold Ps0
specifies the minimum level of a sound signal, and the threshold delta specifies the minimum
ratio of the sound signal level to the background-noise level.
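
The two-comparator judgment of drawing 2 can be sketched as below. The function name judge_voice
and the example threshold values are hypothetical; the division Ps/Pn is rewritten as a
multiplication here only to avoid dividing by zero, which is an implementation choice and not
something the patent states.

```python
def judge_voice(ps, pn, ps0, delta):
    """Return True (H, sound signal) or False (L, non-voice signal)."""
    d1 = ps > ps0         # comparator CP1: Ps above the minimum voice level Ps0
    d2 = ps > delta * pn  # divider 3a + comparator CP2: Ps/Pn > delta (for Pn > 0)
    return d1 and d2      # AND-operation section 3b


# Example with assumed values Ps0 = 0.5 and delta = 4.0.
print(judge_voice(1.0, 0.1, ps0=0.5, delta=4.0))   # loud and well above the noise
print(judge_voice(0.3, 0.01, ps0=0.5, delta=4.0))  # above the noise, but below Ps0
print(judge_voice(1.0, 0.5, ps0=0.5, delta=4.0))   # loud, but Ps/Pn is only 2
```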

[0011] The background-noise power presumption change section 4 consists of a switch which is turned
on and off by the control signal Vs outputted from the amount control section 14 of insertion
losses of the voice switch VS, and which connects or disconnects the input of the reference signal
Vx to the background-noise power presumption section 2. The background-noise power presumption
section 2 is in the update mode when the background-noise power presumption change section 4 is on
and the reference signal Vx is inputted, and in the halt mode when the background-noise power
presumption change section 4 is off and the reference signal Vx is not inputted. In the update
mode, the background-noise power presumption section 2 updates the background-noise power estimate
Pn successively with reference to the reference signal Vx. In the halt mode, the background-noise
power presumption section 2 suspends this processing and holds the value calculated before the halt
as the background-noise power estimate Pn.
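
The update mode and halt mode can be sketched as a small stateful estimator. The class name,
the one-pole update rule, and the coefficient values are assumptions for illustration; the patent
specifies only that the estimate is updated while the switch is on and held while it is off.

```python
class BackgroundNoiseEstimator:
    """Sketch of presumption section 2 gated by change section 4 (assumed model)."""

    def __init__(self, initial_pn=0.0, rise=0.001, fall=0.5):
        self.pn = initial_pn  # background-noise power estimate Pn
        self.rise = rise      # small coefficient: rises gently
        self.fall = fall      # large coefficient: falls steeply

    def process(self, vx, switch_on):
        """Feed one sample of Vx; switch_on models the state of change section 4."""
        if switch_on:
            # Update mode: follow the power of the reference signal.
            p = vx * vx
            coeff = self.rise if p > self.pn else self.fall
            self.pn += coeff * (p - self.pn)
        # Halt mode: Vx is not inputted, so the previous value is held.
        return self.pn
```

Calling process with switch_on=False returns the stored value unchanged regardless of the sample,
which is the hold behaviour described for the halt mode.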

[0012] Here, the amount control section 14 of insertion losses of the voice switch VS turns off the
background-noise power presumption change section 4 with the control signal Vs when it judges the
talk state to be a receiver state, and turns on the background-noise power presumption change
section 4 with the control signal Vs when it judges the talk state to be a transmission state. The
background-noise power presumption section 2 is therefore in the halt mode in a receiver state and
in the update mode in a transmission state. Accordingly, in the voice detector VD1, the presumption
processing of the background-noise power presumption section 2 is suspended when the rate of the
acoustic turnover component contained in the reference signal Vx is large, so that the
background-noise power estimate Pn can be kept at a value approximating the background-noise power
around the microphone 10 regardless of the talk state. Consequently, the degradation of voice
detection performance caused by far-end background noise turning back to the microphone 10 can be
reduced even in a system in which the distance between the microphone 10 and the loudspeaker 11 is
short and the acoustic turnover gain is large. In addition, the detecting signal (detection flag)
SD1 of the voice detector VD1 is given to, for example, the voice switch VS, and is used for
various control.
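
The effect of tying the switch to the talk state can be illustrated with a small simulation. The
function track_pn, the frame representation, the talk-state labels, and all power values are
hypothetical; the sketch only shows that holding Pn during the receiver state keeps far-end noise
from inflating the estimate.

```python
def track_pn(frames, pn, gated, rise=0.01, fall=0.5):
    """Track Pn over (talk_state, mic_power) frames.

    talk_state is "transmit" or "receive". With gated=True the estimator is
    halted in the receiver state, as in the voice detector VD1; with
    gated=False it always updates, as in the conventional detector VD'.
    """
    for talk_state, power in frames:
        switch_on = (talk_state == "transmit") or not gated  # control signal Vs
        if switch_on:
            coeff = rise if power > pn else fall
            pn += coeff * (power - pn)
    return pn


# Quiet near-end room, then a long receiver state with loud far-end noise
# leaking from the loudspeaker into the microphone (assumed power values).
frames = [("transmit", 1e-4)] * 50 + [("receive", 1e-2)] * 500
pn_gated = track_pn(frames, pn=1e-4, gated=True)
pn_ungated = track_pn(frames, pn=1e-4, gated=False)

speech_power, delta = 4e-3, 4.0  # assumed near-end speech power and threshold
print(speech_power / pn_gated > delta)    # detected with the held estimate
print(speech_power / pn_ungated > delta)  # missed with the inflated estimate
```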

[0013] As mentioned above, according to the voice detector VD1 of this invention, when the rate of
the acoustic turnover component contained in the reference signal Vx is large, the background-noise
power presumption change section 4 suspends the processing of the background-noise power
presumption section 2, and the judgment by the 1st voice / non-voice judging section 3 is performed
based on the background-noise power estimate Pn that was obtained while the rate of the acoustic
turnover component contained in the reference signal Vx was small and that the background-noise
power presumption section 2 has held. This reduces the situation in which, when the background
noise at the far-end telephone call terminal is sent out from the loudspeaker 11 and turns back to
the microphone 10, the ratio of the voice component uttered by the near-end speaker to the other
background-noise components in the acoustic signal collected by the microphone 10 becomes small, so
that a sound signal cannot be detected even though it is contained in the reference signal Vx. Even
in a system in which the distance from the loudspeaker 11 to the microphone 10 is short and the
acoustic turnover gain is large, it can be detected with sufficient precision whether the reference
signal Vx is a sound signal.
[0014] (Operation gestalt 2) Drawing 3 shows the block diagram of the voice detector VD2 in the
operation gestalt 2 of this invention. Since the fundamental composition of this operation
gestalt is common to the operation gestalt 1, the same signs are given to the common composition
and their explanation is omitted.

[0015] This operation gestalt is characterized in that it is equipped with the voice non-detecting
duration timing section 5, which finds, based on the detecting signal SD1 of the 1st voice /
non-voice judging section 3, the times tau1, tau2, ... during which the reference signal Vx is
judged to be a non-voice signal as shown in drawing 4, i.e., the times during which the detecting
signal SD1 is at L level (henceforth "voice non-detecting duration"), and with the 2nd voice /
non-voice judging section 6, which judges whether the reference signal Vx is a sound signal or a
non-voice signal based on the voice non-detecting durations tau1, tau2, ... found by the voice
non-detecting duration timing section 5.

[0016] Here, the voice non-detecting duration timing section 5 resets its timing processing
whenever the detecting signal SD1 changes from L to H, but holds the preceding timing results
(voice non-detecting durations tau1, ...) in a storage means such as RAM for at least about a
phoneme duration of a sound signal.
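
The timing behaviour of the section 5 can be sketched as a run-length counter over the detecting
signal SD1. The function name and the sample-count representation of time are assumptions; a plain
list stands in for the storage means, and limiting how long results are kept is left out of this
sketch.

```python
def non_detection_durations(sd1):
    """Return the lengths (in samples) of L-level runs of SD1 that end in an L-to-H change.

    sd1 is a sequence of truthy (H) and falsy (L) values.
    """
    durations = []  # stands in for the storage means (e.g. RAM)
    count = 0
    for level in sd1:
        if not level:
            count += 1  # SD1 is at L level: keep timing
        else:
            if count > 0:
                durations.append(count)  # L -> H: store tau, then reset
            count = 0
    return durations


print(non_detection_durations([1, 0, 0, 0, 1, 1, 0, 0, 1, 0]))  # prints [3, 2]
```

The trailing L run in the example is not stored because no L-to-H change has closed it yet.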

[0017] The 2nd voice / non-voice judging section 6 refers to the voice non-detecting durations
tau1, tau2, ..., tauN memorized by the storage means of the voice non-detecting duration timing
section 5. When these values are approximately constant over a time of about the phoneme duration of
human voice and are about the pitch interval of human voice, it detects these sections tau1 to tauN
anew as voice sections and outputs the detecting signal SD2 of H level (refer to drawing 4).
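
One way to express the second judgment is sketched below. Every numeric value here is an
assumption made for illustration (a pitch period range of 2 to 12.5 ms, a phoneme duration of about
50 ms, and a 20 percent tolerance for "approximately constant"); the patent gives no figures. The
sketch also simplifies the time span to the sum of the stored durations.

```python
def second_judgment(durations_ms, pitch_range_ms=(2.0, 12.5),
                    phoneme_ms=50.0, tolerance=0.2):
    """Return True (SD2 = H) if the stored durations tau1..tauN look like voice.

    durations_ms: voice non-detecting durations in milliseconds.
    All default parameter values are hypothetical.
    """
    if not durations_ms:
        return False
    low, high = pitch_range_ms
    # Each duration must be about the pitch period of human voice.
    if not all(low <= d <= high for d in durations_ms):
        return False
    # The durations must be approximately constant.
    mean = sum(durations_ms) / len(durations_ms)
    if any(abs(d - mean) > tolerance * mean for d in durations_ms):
        return False
    # The regular pattern must persist for about a phoneme duration.
    return sum(durations_ms) >= phoneme_ms
```

For example, a stored sequence of gaps close to 8 ms that together cover more than 50 ms would be
re-judged as voice, while a sequence containing a 30 ms gap, or one that is too short, would not.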

[0018] According to this operation gestalt, as shown in drawing 4, even when the ambient noise
level VN near the microphone 10 is high, the ratio Ps/Pn of the instant power estimate Ps to the
background-noise power estimate Pn is therefore small, and the reference signal Vx is judged to be
a non-voice signal in the 1st voice / non-voice judging section 3 in spite of containing voice, it
can be detected anew as a sound signal in the 2nd voice / non-voice judging section 6.
Consequently, compared with the operation gestalt 1, there is an advantage that a sound signal can
be detected with still better precision even in a **** telephone call system with a large
background-noise level.
[0019] 
[Effect of the Invention] The invention of claim 1 is a voice detector used in a **** telephone
call terminal of a **** telephone call system in which a **** telephone call terminal having a
microphone and a loudspeaker is connected to other telephone call terminals or **** telephone call
terminals and performs a half-duplex telephone call, the voice detector detecting whether the
signal transmitted on a channel is a sound signal or a non-voice signal. It has an instant power
presumption section which presumes the instant power of a reference signal taken out from the
channel, a background-noise power presumption section which presumes the power of the
background-noise component contained in the reference signal, a 1st voice / non-voice judging
section which judges whether the reference signal is a sound signal or a non-voice signal based on
the instant power estimate presumed in the instant power presumption section and the
background-noise power estimate presumed in the background-noise power presumption section and
which holds the last judgment result until the judgment result is updated, and a background-noise
power presumption change section which switches between updating and halting the background-noise
power estimate in the background-noise power presumption section. Therefore, when the rate of the
acoustic turnover component contained in the reference signal is large, the background-noise power
presumption change section suspends the processing of the background-noise power presumption
section, and the judgment by the 1st voice / non-voice judging section is performed based on the
background-noise power estimate that was obtained while the rate of the acoustic turnover component
contained in the reference signal was small and that the background-noise power presumption section
has held. This reduces the situation in which, when the background noise at the far-end telephone
call terminal is sent out from the loudspeaker and turns back to the microphone, the ratio of the
voice component uttered by the near-end speaker to the other background-noise components in the
acoustic signal collected by the microphone becomes small, so that a sound signal cannot be
detected even though it is contained in the reference signal. There is thus the effect that it can
be detected with sufficient precision whether the reference signal is a sound signal even when the
background-noise level is large.
[0020] The invention of claim 2 has a voice non-detecting duration timing section which finds, from
the judgment result of the 1st voice / non-voice judging section, the voice non-detecting duration
during which the reference signal is detected as a non-voice signal, and a 2nd voice / non-voice
judging section which judges from the voice non-detecting duration found by the voice non-detecting
duration timing section whether the reference signal is a sound signal or a non-voice signal. The
2nd voice / non-voice judging section judges all the reference signals in the voice non-detecting
durations to be sound signals when the voice non-detecting duration found by the timing section is
approximately constant over a time of about the phoneme duration of human voice and is about the
pitch period of human voice. Therefore, when a sound signal is not detected in the 1st voice /
non-voice judging section because the background-noise level is very large, the reference signal is
judged anew as a sound signal in the 2nd voice / non-voice judging section if the voice
non-detecting duration measured by the timing section is almost constant over about an audio
phoneme duration and almost equal to the pitch interval of human voice. This has the effect that a
sound signal can be detected with still better precision in a **** telephone call system with a
large background-noise level.



[Translation done.] 